Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to drop partial buckets from date_histogram visuals #19979

Merged
merged 8 commits into from
Aug 30, 2018

Conversation

blfrantz
Copy link
Contributor

@blfrantz blfrantz commented Jun 18, 2018

Closes #2806.

Note: description has changed since the original version to reflect a new implementation.

Date Histograms often begin and/or end with incomplete buckets, which makes the beginning/end of time-series charts appear to show steep up/down trends which can be misleading or alarming. This feature adds a new "Drop partial buckets" option for the Date Histogram aggregation. It only appears/applies when the chosen field is the same as the index's Time Filter field (that's the only case where this feature makes sense).

drop_bucket

When selected, any buckets which span more time than is covered by the query's Time Range will be removed from the chart.

@elasticmachine
Copy link
Contributor

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@timroes timroes added Feature:Visualizations Generic visualization features (in case no more specific feature label is available) Feature:Aggregations Aggregation infrastructure (AggConfig, esaggs, ...) labels Jun 18, 2018
@timroes timroes requested review from ppisljar and timroes June 18, 2018 07:01
@ppisljar
Copy link
Member

thanks a lot for your contribution @blfrantz !

As you mention this is something aggregation specific. Adding it to handler.js means it will only work with pie, area, line, bar, heatmap and gauge charts. It won't with all the others (maps, metric, tagcloud, more to come, any 3rd party visualizations).

I would suggest implementing this as an aggregation option. The date histogram aggregation config and logic is in ui/agg_types/buckets/date_histogram.js. This way this option can be completely independent from the visualization.

The actual conversion of data happens in tabify response handler, this is probably where you should check for this option and skip partial buckets. ui/vis/response_handlers/tabify.js and basic.js in same folder use the tabify functions defined in ui/agg_response/tabify.

let me know what you think and if this is helpful.

@blfrantz
Copy link
Contributor Author

Thanks @ppisljar! Sounds like good advice. I like the idea of putting this with the date_histogram aggregation instead of the axis config. I was thinking this was more of an axis display option which led me to my implementation, but yours should be a lot cleaner. I'll work on this and update the PR shortly (doing this in my free time so it may be a few days).

@blfrantz
Copy link
Contributor Author

blfrantz commented Jun 23, 2018

Hey @ppisljar, now that I've gotten into it I have a question: is it possible to get the time range covered by the query from within the response_handlers? It seems like this information should be knowable since the exact timestamps are specified in the "range" portion of the _msearch query, but I can't find where to get this from the vis object. My issue is that without knowing the range of time included in the query, I can't know for sure which buckets are indeed "partial." I don't want to just assume the first/last ones are and drop them every time since absolute time ranges could be expected to align with the buckets and not have any partials.

The nice thing about my first approach, ugly as it was, is it had access to this range info and could use the same logic as the "this area contains partial data" tooltip (which incidentally also made it easy to extend that tooltip to say something more relevant in this case).

Any suggestions? I do like the genericism of your suggestion, but it's not clear to me where to get this range information.

Edit: One other nice thing about editing the data in handler.js and treating this as a view option rather than an aggregation option, is we don't need to requery when toggling this setting. But that's minor.

@blfrantz
Copy link
Contributor Author

@ppisljar, any thoughts? Thanks!

@ppisljar
Copy link
Member

sorry for late reply.

in your response handler you can access the timefilter on vis.filters.timeRange

preferably you would be able to figure this out of elastic search response, but i am not sure if that is possible.

@blfrantz
Copy link
Contributor Author

No problem, @ppisljar .

So in the relative search case, timeRange only contains something like {from: "now-15m", to: "now", mode: "quick"}. Without knowing when the query fired, that's not really useful (and even if I did, it'd be ugly to calculate). As for the ES response, the data returned by the _msearch query doesn't appear to have any timestamp info aside from the bucket times, but those don't necessarily align with the queried range (if they did, this whole feature would be unnecessary).

Unless this information is available somewhere I'm not aware of (or not that hard to pass in), I'm beginning to think my original approach may be more realistic. In the same way that the "partial data" tooltip only shows up for some types of visuals (not all where it could be relevant), I'm wondering if having this feature exist only for some of the relevant cases (arguably the ones that matter most) is enough for now? What do you think? If you're ok with that approach, can you provide feedback on my 4th question (about test frameworks)?

Thanks!

@ppisljar
Copy link
Member

import { getTime } from 'ui/timefilter/get_time';

const parsedTime = getTime(vis.indexPattern, vis.filters.timeRange);

let me know if this helps

@blfrantz blfrantz changed the title Add option to drop partial buckets from data_histogram visuals Add option to drop partial buckets from date_histogram visuals Jul 1, 2018
@blfrantz
Copy link
Contributor Author

blfrantz commented Jul 1, 2018

@ppisljar thanks for that tip, it worked great. I've updated the PR with the new implementation, let me know what you think.

Copy link
Member

@ppisljar ppisljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @blfrantz !

i think this will work much better (specially with our future plans).

i have one more suggestion. Generally i think we are always talking about first and last bucket. Would it make sense to let tabify do its job (convert to table) and then check just some rows and remove those in something like a postprocessing step on tabify ? this way we wouldn't need to pass time information all the way down to TabifyBuckets and we could keep this code more isolated.

@@ -81,4 +82,20 @@ TabifyBuckets.prototype._orderBucketsAccordingToParams = function (params) {
}
};

TabifyBuckets.prototype._dropPartials = function (params, timeRange) {
if (params.drop_partials && !this.objectMode && this.buckets.length > 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than wrapping the whole thing in an if statement return early:

if (!params.drop_partials || this.objectMode || this.buckets.length <= 1) { return; }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move the if (params.drop_partials) out of this function ?

also join with below if statement ... and extract to variables so its easier to read:

 const isTimeField = params.field.name === timeRange.name;
if (this.objectMode || this.buckets.length <= 1 || !isTimeField) { 
 return;
}

@blfrantz
Copy link
Contributor Author

blfrantz commented Jul 5, 2018

Thanks @ppisljar. That makes sense, and does make the code organization more elegant/intuitive. However I'm finding that it makes the dropPartials function itself rather more complicated and therefore more brittle.

Because of the way the tabified data is structured, things that were simple become a bit complicated in the proposed approach.

In the no-split-charts case, the following works (called with dropPartialRows(write.response(), write.timeRange) toward the end of tabifyAggResponse() in tabify.js:

function dropPartialRows(tableGroup, timeRange) {
  tableGroup.tables.forEach(table => {
    if (table.rows.length < 2) return;

    const aggIndex = _.findIndex(table.columns, {
      'aggConfig': {
        '_opts': {
          'type': 'date_histogram',
          'params': {
            'field': timeRange.name,
            'drop_partials': true
          }
        }
      }
    });

    if (aggIndex !== -1) {
      const time0 = table.rows[0][aggIndex].key;
      const time1 = _.find(table.rows, r => {
        if (r[aggIndex].key !== time0) return true;
        else return false;
      })[aggIndex].key;

      const interval = time1 - time0;

      table.rows = table.rows.filter(arr => {
        if (arr[aggIndex].key < timeRange.gte) return false;
        if (arr[aggIndex].key + interval > timeRange.lte) return false;
        return true;
      });
    }
  });
}

That's not too bad, but some things that could be simple (like determining the interval) get a bit complicated because tabify's rows array contains a separate row for every value on each series, which means there can be multiple rows per time bucket, which means I have to scan the array until I find a new time to determine the interval. Previously, I could just compare the first and second items in the array.

Furthermore, the above code doesn't handle the split charts case (whereas the current PR code does), because in that event tableGroup contains tables of tables, and so I'd need to process this recursively or something. That's not a big deal, but as this gets more complex (and computationally expensive) I wanted to see if you thought this was still the best approach. It also seems harder to test. Apologies if I'm missing something. Thanks!

Copy link
Member

@ppisljar ppisljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the implementation as it is in the current PR seems to be way simpler.

i left some nitty picking comments, feel free to ignore them.

@timroes can you also take a look ?

&nbsp;
<icon-tip
position="'right'"
content="'Removes buckets that include times not covered by the Time Range.'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gchaps What would be the best wording we could use here to describe the "Drop partial buckets". Besides the above I thought about something like:

Removes buckets from the beginning and end, that are partially outside the visualization's time range.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timroes

Maybe something like this:

Remove buckets that span time outside the time range so the histogram doesn't start and end with incomplete buckets.

@@ -74,7 +75,8 @@ const BasicResponseHandlerProvider = function (Private) {
const tableGroup = aggResponse.tabify(vis.getAggConfig().getResponseAggs(), response, {
canSplit: true,
asAggConfigResults: true,
isHierarchical: vis.isHierarchical()
isHierarchical: vis.isHierarchical(),
timeRange: getTime(vis.indexPattern, vis.filters.timeRange).range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppisljar Since we talked earlier about removing vis.filters, should we introduce a new reference here? Do we have an idea how we remove that in the future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vis.filters is going away in the long run, but the response handlers will still get the access to the timeRange (probably being passed in directly as a parameter in the future) so this should not be a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blfrantz If you create a visualization without a time field you will get an error for this line because the getTime will return undefined if not time field is specified on the selected indexPattern.

You can verify the error using the shakespeare dataset https://www.elastic.co/guide/en/kibana/current/tutorial-load-dataset.html and just creating a barchart with that index.

@blfrantz
Copy link
Contributor Author

blfrantz commented Jul 9, 2018

@ppisljar Thanks, I've cleaned up the conditionals for _dropPartials, let me know how it looks now. I still need to rebase/retest before this is ready to pull but waiting to see if any further tweaks are needed per @timroes comments.

@blfrantz
Copy link
Contributor Author

@ppisljar & @timroes: I just force-pushed a cleanup of the commit history to remove all the old iterations. As part of this I added the new tooltip text recommended by @gchaps. I also just rebased onto the latest upstream, fixed a couple of my tests that were broken by some recent upstream changes, and sanity tested that the feature still works as expected. Awaiting any further instructions. Thanks!

@blfrantz
Copy link
Contributor Author

@ppisljar & @timroes: I know you guys are busy, but just wanted to make sure this was still on your radar. Thanks!

@timroes
Copy link
Contributor

timroes commented Jul 23, 2018

Jenkins, test this

@elasticmachine
Copy link
Contributor

💔 Build Failed

@blfrantz
Copy link
Contributor Author

@markov00 Thanks for the feedback and tip about that test failure. I've merged and made the requested fixes. I also confirmed that the failing Shakespeare test passes locally now.

@markov00
Copy link
Member

jenkins test this

@elasticmachine
Copy link
Contributor

💔 Build Failed

@markov00
Copy link
Member

jenkins test this

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@markov00
Copy link
Member

jenkins test this, want to be sure it was a flaky CI test failure

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@blfrantz
Copy link
Contributor Author

What's the next step here?

@timroes
Copy link
Contributor

timroes commented Aug 30, 2018

Jenkins, test this - I'll give it a last test run, then will merge it.

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@timroes timroes merged commit 25761fb into elastic:master Aug 30, 2018
timroes pushed a commit to timroes/kibana that referenced this pull request Aug 30, 2018
…ic#19979)

* Add Drop Partials option to date histogram agg settings UI

* Add timeRange to aggOpts and parse in _response_writer

* Implement dropPartials method in TabifyBuckets

* Fixed a couple issues

* Fixed issue with undefined timeRange

* Use braces for conditionals
@timroes
Copy link
Contributor

timroes commented Aug 30, 2018

Hi Brian,

thanks a lot for your first contribution to Kibana! I just merged your commit on master and it will be backported to 6.x, so that these changes will be released in 6.5.

Cheers
Tim

timroes added a commit that referenced this pull request Aug 30, 2018
… (#22528)

* Add Drop Partials option to date histogram agg settings UI

* Add timeRange to aggOpts and parse in _response_writer

* Implement dropPartials method in TabifyBuckets

* Fixed a couple issues

* Fixed issue with undefined timeRange

* Use braces for conditionals
@blfrantz
Copy link
Contributor Author

Awesome, thanks everyone!

@LeeDr
Copy link

LeeDr commented Nov 10, 2018

@timroes or @markov00 or @ppisljar can any of you help me understand why I don't see this option here on this 6.5.0 visualization? This one is using ecommerce sample data, and I also tried a new line chart from makelogs data.

image

@ppisljar
Copy link
Member

@LeeDr #25520

@timroes timroes added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Nov 12, 2018
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app

@Benny-Git
Copy link

We just updated to 6.5.2, and I still don't see the checkbox option, even for a newly created visualization.
According to the release notes, #25520 should be included in 6.5.2?

@timroes
Copy link
Contributor

timroes commented Dec 7, 2018

@Benny-Git unfortunately that PR was missing the backport to 6.5 😞 This should be released in a future 6.5 patch release instead, sorry.

@gchaps Could you perhaps remove Fixes option for showing partial buckets #25520 from the 6.5.2 release notes?

@Benny-Git
Copy link

@timroes thanks for the very quick response.
I'll just manually update my few relevant saved objects with "drop_partials": true for now :)

@gchaps
Copy link
Contributor

gchaps commented Dec 7, 2018

@timroes Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Aggregations Aggregation infrastructure (AggConfig, esaggs, ...) Feature:Visualizations Generic visualization features (in case no more specific feature label is available) Team:Visualizations Visualization editors, elastic-charts and infrastructure v6.5.0 v7.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants